Speaker recognition with recurrent neural networks
Abstract
We report on the application of recurrent neural nets to an open-set, text-dependent speaker identification task. The motivation for applying recurrent neural nets to this domain is to find out whether their ability to take short-term spectral features as input, yet respond to long-term temporal events, is advantageous for speaker identification. We use a feedforward net architecture adapted from that introduced by Robinson et al. We introduce a fully connected hidden layer between the input and state nodes and the output, and show that this hidden layer makes the learning of complex classification tasks more efficient. Training uses back propagation through time. There is one output unit per speaker, with the training targets corresponding to speaker identity. For 12 speakers (a mixture of male and female) we obtain a true acceptance rate of 100% with a false acceptance rate of 4%. For 16 speakers these figures are 94% and 7%, respectively. We also investigate how sensitive identification accuracy is to several factors: environmental factors (signal level, change of microphone and band limitation), the choice of acoustic vectors (FFT, LPC or cepstral), the distribution of speakers in the training database, and the inclusion of fundamental frequency. FFT features plus fundamental frequency give the best results. This performance is shown to compare favorably with studies of similar tasks that use Hidden Markov Model techniques.
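The abstract describes the network only in outline. The following Python sketch illustrates the described data flow under stated assumptions and is not the authors' implementation. At each frame, the acoustic vector and the previous state feed a fully connected hidden layer. That hidden layer produces the next state and one output unit per speaker. The open-set decision then accepts or rejects the best-scoring speaker.

All of the following are hypothetical choices made for illustration:

- the layer sizes;
- the tanh and sigmoid activations;
- the omission of bias terms;
- averaging frame outputs into an utterance score;
- the acceptance threshold;
- the helper names `RecurrentSpeakerNet`, `open_set_decision` and `acceptance_rates`.

Only the forward pass is shown. Training by back propagation through time is omitted.

```python
import math
import random


def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; a hypothetical initialisation."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class RecurrentSpeakerNet:
    """Hypothetical recurrent net with an extra fully connected hidden layer.

    Per frame t: hidden = tanh(W_h [x_t; s_{t-1}]),
                 s_t    = tanh(W_s hidden),
                 y_t    = sigmoid(W_y hidden), one unit per enrolled speaker.
    Bias terms are omitted to keep the sketch short.
    """

    def __init__(self, n_in, n_state, n_hidden, n_speakers, seed=0):
        rng = random.Random(seed)
        self.n_state = n_state
        self.W_h = rand_matrix(n_hidden, n_in + n_state, rng)
        self.W_s = rand_matrix(n_state, n_hidden, rng)
        self.W_y = rand_matrix(n_speakers, n_hidden, rng)

    def frame_outputs(self, frames):
        """Run the net over a sequence of acoustic vectors; return per-frame outputs."""
        state = [0.0] * self.n_state  # initial state before the first frame
        outputs = []
        for x in frames:
            hidden = [math.tanh(a) for a in matvec(self.W_h, list(x) + state)]
            state = [math.tanh(a) for a in matvec(self.W_s, hidden)]  # fed back next frame
            outputs.append([sigmoid(a) for a in matvec(self.W_y, hidden)])
        return outputs

    def utterance_scores(self, frames):
        """Average each speaker unit's output over all frames (an assumed scoring rule)."""
        if not frames:
            raise ValueError("utterance must contain at least one frame")
        outs = self.frame_outputs(frames)
        return [sum(o[k] for o in outs) / len(outs) for k in range(len(outs[0]))]


def open_set_decision(scores, threshold):
    """Return the best-scoring speaker index, or None to reject as an unknown speaker."""
    best = max(range(len(scores)), key=lambda k: scores[k])
    return best if scores[best] >= threshold else None


def acceptance_rates(decisions, labels):
    """labels[i] is the true speaker index, or None for an impostor utterance.

    True acceptance: enrolled-speaker utterances accepted as the correct speaker.
    False acceptance: impostor utterances accepted as any enrolled speaker.
    """
    genuine = [(d, l) for d, l in zip(decisions, labels) if l is not None]
    impostor = [d for d, l in zip(decisions, labels) if l is None]
    tar = sum(d == l for d, l in genuine) / len(genuine) if genuine else 0.0
    far = sum(d is not None for d in impostor) / len(impostor) if impostor else 0.0
    return tar, far
```

For example, take a threshold of 0.5 and the utterance scores `[0.9, 0.1]`, `[0.2, 0.7]` and `[0.3, 0.4]`. These give the decisions `0`, `1` and `None`. Suppose the first two utterances come from enrolled speakers 0 and 1 and the third from an impostor. Then `acceptance_rates` returns a true acceptance rate of 1.0 and a false acceptance rate of 0.0. Raising the threshold can only lower both rates, which is why open-set results are reported as a pair of figures.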
Similar resources
Convolutional neural network with adaptable windows for speech recognition
Although speech recognition systems are widely used and their accuracy continues to improve, there is a considerable gap between their accuracy and human recognition ability. This is partly due to high speaker variability in the speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...
Recurrent neural networks for phoneme recognition
This paper deals with recurrent neural networks of the multilayer perceptron type, which are well suited for speech recognition, especially for phoneme recognition. The ability of these networks has been investigated by phoneme recognition experiments using a number of Japanese words uttered by a native male speaker in a quiet environment. Results of the experiments show that recognition rates achiev...
Master thesis, Bich Ngoc Do (Charles University in Prague, Faculty of Mathematics and Physics; University of Groningen, Faculty of Arts)
Speaker recognition is a challenging task and has applications in many areas, such as access control or forensic science. In recent years, the deep learning paradigm and its branch, deep neural networks, have emerged as powerful machine learning techniques and achieved state-of-the-art performance in many fields of natural language processing and speech technology. Therefore, ...
Modeling speaker variability using long short-term memory networks for speech recognition
Speaker adaptation of deep neural network (DNN) based acoustic models is still a challenging area of research. Considering that long short-term memory (LSTM) recurrent neural networks (RNNs) have been successfully applied to many sequence prediction and sequence labeling tasks, we propose to use LSTM RNNs for modeling speaker variability in automatic speech recognition (ASR). Firstly, the LST...
ASR Confidence Estimation with Speaker-Adapted Recurrent Neural Networks
Confidence estimation for automatic speech recognition has very recently been improved by using Recurrent Neural Networks (RNNs), and also by speaker adaptation (on the basis of Conditional Random Fields). In this work, we explore how to obtain further improvements by combining RNNs and speaker adaptation. In particular, we explore different speaker-dependent and -independent data representation...
Speaker Identification Using Modular Recurrent Neural Networks
This paper demonstrates a speaker identification system based on recurrent neural networks trained with the Real-time Recurrent Learning algorithm (RTRL). A series of speaker identification experiments based on isolated digits has been conducted. The database contains four utterances of ten digits spoken by ten speakers over a period of nine months. The results suggest that recurrent networks c...
Publication date: 2000